K8s application liveness and container start/stop hooks

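A running Pod does not guarantee that the container inside it is healthy, and a running container does not guarantee that the service inside it is healthy. Detecting these states correctly is therefore key to keeping an application healthy. livenessProbe determines whether the container is still alive; when it fails, the kubelet restarts the container. readinessProbe determines whether the service in the container is ready to serve requests. Both probes matter: a Pod should only receive traffic from a Service after a probe has shown that it is ready, otherwise users may hit failed requests. Setting a readinessProbe also lets a rolling update judge the state of the service in each new container, so the application keeps serving healthy responses. livenessProbe and readinessProbe support three handlers: exec (run a command in the container), tcpSocket, and httpGet. The postStart and preStop hooks share the same handler schema, but the Kubernetes API reference marks tcpSocket as unsupported for lifecycle hooks, so in practice they use exec or httpGet.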

livenessProbe

kubectl explain pods.spec.containers.livenessProbe

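livenessProbe supports three kinds of liveness check: tcpSocket, exec, and httpGet. Two of them are demonstrated below.

exec liveness probe

Create a YAML file with the following content: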

vim liveness-exec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-pod
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox:latest
    imagePullPolicy: IfNotPresent # image pull policy: pull only if the image is not present locally
    command: ["/bin/sh", "-c", "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600"] # create /tmp/healthy, sleep 30s, delete it, then sleep 3600s
    livenessProbe: # liveness check: decides whether the container is healthy (readinessProbe decides whether the service inside it is ready)
      exec: # run a command as the check; tcpSocket and httpGet are also supported
        command: ["test", "-e", "/tmp/healthy"]
      initialDelaySeconds: 1 # default 0s; how long after container start before probing begins
      periodSeconds: 3 # default 10s; interval between probes
      failureThreshold: 3 # default 3; consecutive failures before the probe counts as failed
      successThreshold: 1 # default 1; consecutive successes before the probe counts as successful
      timeoutSeconds: 1 # probe timeout, default 1s

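Once the Pod is created, the container creates /tmp/healthy, sleeps for 30s, and then deletes the file. The health check starts 1s after the container starts, tests whether /tmp/healthy exists every 3s (periodSeconds: 3), and treats 3 consecutive failures as a failed probe. When the liveness probe fails, the kubelet restarts the container. The Pod list below shows the restart count.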
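To make the timing concrete, here is a rough sketch of the probe timeline for the manifest above. It is idealized: it assumes probes fire exactly at initialDelaySeconds and then every periodSeconds, ignores kubelet scheduling jitter, and assumes the default terminationGracePeriodSeconds of 30s since the manifest does not set one. The variable names are made up for this illustration.

```shell
#!/bin/sh
# Idealized timeline for liveness-exec-pod (ignores kubelet jitter).
initial_delay=1       # initialDelaySeconds
period=3              # periodSeconds
failure_threshold=3   # failureThreshold
file_removed_at=30    # the container deletes /tmp/healthy after "sleep 30"
grace_period=30       # assumed: default terminationGracePeriodSeconds

t=$initial_delay
failures=0
while :; do
  # A probe at or after the deletion time finds no file and fails.
  if [ "$t" -ge "$file_removed_at" ]; then
    failures=$((failures + 1))
  fi
  if [ "$failures" -ge "$failure_threshold" ]; then
    break
  fi
  t=$((t + period))
done

probe_failed_at=$t
latest_kill_at=$((probe_failed_at + grace_period))
echo "probe considered failed at about ${probe_failed_at}s"
echo "container gone by about ${latest_kill_at}s if it ignores SIGTERM"
```

In the kubectl describe output further down, the previous container ran from 16:10:48 to 16:11:57 (69s) and exited with code 137 (128 + SIGKILL). That is in the same range as this estimate, plausibly because the shell running as PID 1 does not act on SIGTERM and is only killed once the grace period expires.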

[root@k8s001 rexyan]# kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
liveness-exec-pod   1/1     Running   5          6m17s

View the details:

[root@k8s001 rexyan]# kubectl describe pods liveness-exec-pod
Name: liveness-exec-pod
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s002/172.20.245.189
Start Time: Sun, 19 May 2019 16:05:01 +0800
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.244.2.2
Containers:
liveness-exec-container:
Container ID: docker://b6d08991993bb306f32b58f7bcc71651ac2b68d1021a05634bcae6832bbbe169
Image: busybox:latest
Image ID: docker-pullable://docker.io/busybox@sha256:4b6ad3a68d34da29bf7c8ccb5d355ba8b4babcad1f99798204e7abb43e54ee3d
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sun, 19 May 2019 16:10:48 +0800
Finished: Sun, 19 May 2019 16:11:57 +0800
Ready: False
Restart Count: 5
Liveness: exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-vckdx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vckdx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m16s default-scheduler Successfully assigned default/liveness-exec-pod to k8s002
Normal Pulling 7m16s kubelet, k8s002 Pulling image "busybox:latest"
Normal Pulled 7m14s kubelet, k8s002 Successfully pulled image "busybox:latest"
Normal Killing 4m17s (x3 over 6m35s) kubelet, k8s002 Container liveness-exec-container failed liveness probe, will be restarted
Normal Created 3m47s (x4 over 7m14s) kubelet, k8s002 Created container liveness-exec-container
Normal Started 3m47s (x4 over 7m13s) kubelet, k8s002 Started container liveness-exec-container
Normal Pulled 3m47s (x3 over 6m5s) kubelet, k8s002 Container image "busybox:latest" already present on machine
Warning Unhealthy 2m5s (x13 over 6m41s) kubelet, k8s002 Liveness probe failed:

The Containers section shows the health check settings configured above:

Restart Count:  5
Liveness: exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s #success=1 #failure=3
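
The exec handler used in this section decides health purely from the command's exit status: 0 means success, anything else is a failure. The same command can be tried in a local shell without a cluster. This is only a sketch; the temporary file path is arbitrary and stands in for /tmp/healthy inside the container.

```shell
#!/bin/sh
# Reproduce the probe command locally; "marker" stands in for /tmp/healthy.
marker="${TMPDIR:-/tmp}/healthy-demo.$$"

touch "$marker"
status_present=0
test -e "$marker" || status_present=$?   # exit status while the file exists

rm -f "$marker"
status_missing=0
test -e "$marker" || status_missing=$?   # exit status after the file is deleted

echo "file present: exit $status_present (probe succeeds)"
echo "file missing: exit $status_missing (probe fails)"
```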

http get liveness probe

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http-pod
  namespace: default
spec:
  containers:
  - name: liveness-http-get-container
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 1

Check the Pod status:

[root@k8s001 rexyan]# kubectl get pods
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   9          23m
liveness-http-pod   1/1     Running            0          104s

View the details:

[root@k8s001 rexyan]# kubectl describe pods liveness-http-pod
Name: liveness-http-pod
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s003/172.20.245.191
Start Time: Sun, 19 May 2019 16:27:15 +0800
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.244.1.3
Containers:
liveness-http-get-container:
Container ID: docker://9cb65d175dc8263f54891b597e3a5f4a334f20c4ab636d532887cabfeb7cff3c
Image: ikubernetes/myapp:v1
Image ID: docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 19 May 2019 16:27:18 +0800
Ready: True
Restart Count: 0
Liveness: http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-vckdx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vckdx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m58s default-scheduler Successfully assigned default/liveness-http-pod to k8s003
Normal Pulling 2m58s kubelet, k8s003 Pulling image "ikubernetes/myapp:v1"
Normal Pulled 2m55s kubelet, k8s003 Successfully pulled image "ikubernetes/myapp:v1"
Normal Created 2m55s kubelet, k8s003 Created container liveness-http-get-container
Normal Started 2m55s kubelet, k8s003 Started container liveness-http-get-container
[root@k8s001 rexyan]#

The Containers section shows the health check settings configured above:

Restart Count:  0
Liveness: http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3

Now enter the container manually and delete index.html, the page that the health check requests:

[root@k8s001 rexyan]# kubectl get pods
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   11         28m
liveness-http-pod   1/1     Running            0          6m4s

[root@k8s001 rexyan]# kubectl exec -it liveness-http-pod -- /bin/sh 
/ # rm -f /usr/share/nginx/html/index.html

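Checking the Pod status again shows that the container has been restarted once. The restarted container is created fresh from the image, so the deleted file is back and no further restarts occur.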

[root@k8s001 rexyan]# kubectl get pods
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   11         30m
liveness-http-pod   1/1     Running            1          8m12s
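
The walkthrough above covers exec and httpGet. For completeness, here is a minimal sketch of the third handler, tcpSocket, written as a shell heredoc. The Pod name, container name, and file name are hypothetical; the image and port are reused from the http get example. With tcpSocket, the probe succeeds if a TCP connection to the port can be opened.

```shell
#!/bin/sh
# Write a hypothetical manifest that probes port 80 over TCP.
cat > liveness-tcp.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp-pod          # hypothetical name
  namespace: default
spec:
  containers:
  - name: liveness-tcp-container  # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: http                # healthy if a TCP connection can be opened
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      timeoutSeconds: 1
EOF
echo "wrote liveness-tcp.yaml"
```

It could then be created with kubectl create -f liveness-tcp.yaml, in the same way as the other manifests in this article.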

readinessProbe

kubectl explain pods.spec.containers.readinessProbe

readinessProbe also supports three kinds of check: tcpSocket, exec, and httpGet. One of them is demonstrated below.

http get readiness probe

apiVersion: v1
kind: Pod
metadata:
  name: readiness-http-pod
  namespace: default
spec:
  containers:
  - name: readiness-http-get-container
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 1

[root@k8s001 rexyan]# kubectl create -f readiness-http-get.yaml 
pod/readiness-http-pod created
[root@k8s001 rexyan]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          26m
readiness-http-pod   1/1     Running   0          5s

Then enter the container and delete index.html:

[root@k8s001 rexyan]# kubectl exec -it readiness-http-pod -- /bin/sh 
/ # rm -f /usr/share/nginx/html/index.html

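Checking the Pod list shows that the READY count of readiness-http-pod has dropped to 0. In the READY column, the number before the slash is how many containers in the Pod are ready, and the number after it is the total number of containers in the Pod. Unlike a failed liveness probe, the failed readiness probe does not restart the container: RESTARTS stays at 0.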

[root@k8s001 rexyan]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          30m
readiness-http-pod   0/1     Running   0          3m43s
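
The Pod in this example carries no labels, so no Service selects it. To watch readiness affect a Service, one option is to label the Pod and add a Service that selects it. The sketch below only writes such a Service manifest; the label app=readiness-demo, the Service name, and the file name are assumptions and not part of the original setup.

```shell
#!/bin/sh
# Write a hypothetical Service that selects Pods labelled app=readiness-demo.
cat > readiness-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: readiness-demo-svc   # hypothetical name
  namespace: default
spec:
  selector:
    app: readiness-demo      # hypothetical label; the Pod must carry it
  ports:
  - name: http
    port: 80
    targetPort: http         # named container port from the Pod manifest
EOF
echo "wrote readiness-svc.yaml"
```

After kubectl label pod readiness-http-pod app=readiness-demo and kubectl create -f readiness-svc.yaml, the output of kubectl get endpoints readiness-demo-svc should list the Pod IP only while READY shows 1/1.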

Enter the container again and write new content into nginx's index file:

[root@k8s001 rexyan]# kubectl exec -it readiness-http-pod -- /bin/sh 
/ # echo "hi k8s" >> /usr/share/nginx/html/index.html

Checking the Pod list again shows that READY has gone from 0 back to 1:

[root@k8s001 rexyan]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          38m
readiness-http-pod   1/1     Running   0          11m

View the detailed Pod information:

[root@k8s001 rexyan]# kubectl describe pods readiness-http-pod 
Name: readiness-http-pod
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s002/172.20.245.189
Start Time: Sun, 19 May 2019 16:54:04 +0800
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.244.2.3
Containers:
readiness-http-get-container:
Container ID: docker://2989185e07600a552f6a57ecc3e813156002e2218701da07da8b2efbfaf7c966
Image: ikubernetes/myapp:v1
Image ID: docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 19 May 2019 16:54:07 +0800
Ready: True
Restart Count: 0
Readiness: http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-vckdx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vckdx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/readiness-http-pod to k8s002
Normal Pulling 14m kubelet, k8s002 Pulling image "ikubernetes/myapp:v1"
Normal Pulled 14m kubelet, k8s002 Successfully pulled image "ikubernetes/myapp:v1"
Normal Created 14m kubelet, k8s002 Created container readiness-http-get-container
Normal Started 14m kubelet, k8s002 Started container readiness-http-get-container
Warning Unhealthy 4m4s (x134 over 10m) kubelet, k8s002 Readiness probe failed: HTTP probe failed with statuscode: 404
[root@k8s001 rexyan]#

The Containers section shows the health check settings configured above:

Restart Count:  0
Readiness: http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
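
The 404 in the Events above counts as a failure because an httpGet probe treats any status code from 200 up to, but not including, 400 as success. A small shell helper illustrates the rule; the function name is made up for this example.

```shell
#!/bin/sh
# Mirror the httpGet success rule: 200 <= status < 400.
http_probe_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

for code in 200 302 404 500; do
  if http_probe_ok "$code"; then
    echo "$code -> success"
  else
    echo "$code -> failure"
  fi
done
```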

Container start and stop hooks

A container has one hook that runs right after it starts and one that runs before it is stopped: postStart and preStop.

postStart

kubectl explain pods.spec.containers.lifecycle.postStart

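The postStart schema lists three handlers: tcpSocket, exec, and httpGet. The API reference marks tcpSocket as unsupported for lifecycle hooks, so use exec or httpGet.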
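
As a minimal sketch of an exec postStart hook, the manifest below writes a marker file right after the container starts. The Pod name, container name, file name, and marker path are hypothetical; busybox is reused from the exec example. Note that Kubernetes does not guarantee the hook runs before the container's main command.

```shell
#!/bin/sh
# Write a hypothetical manifest with an exec postStart hook.
cat > poststart-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: poststart-demo-pod          # hypothetical name
  namespace: default
spec:
  containers:
  - name: poststart-demo-container  # hypothetical name
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 3600"]
    lifecycle:
      postStart:
        exec:                       # runs inside the container after it starts
          command: ["/bin/sh", "-c", "echo started > /tmp/poststart"]
EOF
echo "wrote poststart-demo.yaml"
```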

preStop

kubectl explain pods.spec.containers.lifecycle.preStop

preStop uses the same handler schema as postStart: exec and httpGet are usable, while tcpSocket is listed but not supported for hooks.
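
A preStop hook is typically used for graceful shutdown. The sketch below assumes the ikubernetes/myapp:v1 image runs nginx (the earlier steps edit /usr/share/nginx/html, which suggests so) and asks it to quit gracefully before the container is stopped. The Pod name, container name, and file name are hypothetical.

```shell
#!/bin/sh
# Write a hypothetical manifest with an exec preStop hook.
cat > prestop-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: prestop-demo-pod          # hypothetical name
  namespace: default
spec:
  containers:
  - name: prestop-demo-container  # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      preStop:
        exec:                     # runs inside the container before it is stopped
          command: ["/bin/sh", "-c", "nginx -s quit"]  # assumes nginx is on PATH
EOF
echo "wrote prestop-demo.yaml"
```

The hook runs within the Pod's termination grace period, so a long-running preStop command can still be cut short when that period expires.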